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ABSTRACT 



The large random access memory and high internal speeds of present 
day computers can be used to increase the efficiency of large-scale simu- 
lation experiments by estimating simultaneously several quantiles of each 
of several statistics. In order to do this without inordinately increas- 
ing programming complexity, quantile estimation schemes are required which 
are simple and do not depend on special features of the distributions of 
the statistics considered. We discuss limitations, when the probability 
level a is very high or very low, of two basic methods of estimating 
quantiles. One method is the direct use of order statistics; the other 
is based on the use of stochastic approximation. 

Several modifications of these two estimation schemes are considered. 
In particular a simple and computationally efficient transformation of the 
simulation data is proposed and the properties (i.e. bias and variance) of 
quantile estimates based on this scheme are discussed. 
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INTRODUCTION 



Consider a situation in which we have a collection of random 

variables X , X , « . . , X with joint distribution F(x , « , « , x ) and 
1 n In 

several statistics (functions of these n random variables), say 

S(n) = g, (X , . . . , X ), T(n) = g,(X , . , , , X ), etc. It is required to 

estimate the distributions Fg(s), F^t), of these statistics (or some 

characteristics of the distributions) by obtaining m samples 

X . , X , , . . , X . , i = 1 , , . . , m, from F(x , , . . , x ) and hence m 
l,i^,i n,i 1 n 

values for each of the statistics S(n), T(n), * Two examples of this 

type of situation are as follows: 

i) In testing for independence in a time series {X. }, many test 

statistics have been proposed* These are functions of finite sets of the 

{X. }, namely X , * » <» , X , and the hypothesis is that F(x ,□<»., x ) 

1 1 n In 

n 

= IT , F^ (x.)* Typical statistics are the sample serial correlation co- 
1=1 X, 1 
1 

efficients with various delays (lags), i* e. 



P,(") =— r~n 



1 n-f 

.2 , (x. - 3c) (x. - x) 

n-f 1 = 1 1 ' i+f 









where x= (x +.*. + x )/n, statistics based on the finite Fourier trans- 
1 n 

form of the x *s which test essentially that the spectrum of {X. } is 
i 1 

flat, and several non-parametric test statistics such as those based on 
runs. The distributions of most of these statistics are known for inde- 
pendent, normally distributed X^*s, but not when the assumption of a 



normal distribution is removedo In testing for a renewal hypothesis 
in series of events (Cox and Lewis, 1966, p. 164) an exponential distri- 
bution for the X.’s may be reasonable^ The null distributions of the test 
1 

statistics are then unknown, as are the rate of convergence to the limit- 
ing (n—"oo) distributions (some of which are known). 

In examining and tabulating the finite sample distributions, it may 
be required to estimate the distributions of several of these test statistics 
for many different values of n* 

ii) There is much interest in analyzing very complex queueing 
and congestion systems, particularly those that arise in computing and 
communication contexts. Here one might be interested in estimating by 
simulation the distributions of the waiting times at several points in the 
system at several different times during its evolution. 

In looking at the problem of estimating these distributions from 
m replications of the statistics, several general problems arise which 
need to be considered* 

First, it is neither practical nor desirable to save all of the in- 
formation generated on a statistic by the simulation in the form of the 
empirical distribution function or, equivalently, in the form of the com- 
plete set of m order statistics* Some compact characterization of the 
distribution is required* In situations such as that of the first example 
given above, out of which this present study in fact arose, the character- 
istics of the distribution function chosen were the first four moments 



3 



and sixteen quantiles of the distribution* (A quantile x of a distribution 
F{x) is defined as the solution of the equation 

a = F( x^) , 0*0<a<U0^ 

and we shall be assuming throughout this paper that x is unique). 

The probability levels chosen for the quantiles were a = 0, 001, 0» 002, 0. 005, 
0*010,0*020, 0*025, 0*050, 0* 100, and a * 0*900, 0*950, 0* 975, 0* 980, 

0* 990, 0* 995, 0* 998, 0* 999* The choice of some of these a^s is based 
on the levels customarily used in testing statistical hypotheses; the more 
extreme values have been chosen rather arbitrarily to characterize the 
extreme tails of the distributions^ In many queueing situations it is these 
extreme values, rather than moments, which are of interest* 

Another possible characteristic is the probability of the statistic 
being less than a given value* These percentiles are clearly important 
in studies of the power of test statistics against non-null hypotheses* 

Their estimation is fairly straightforward and will not be considered here* 

A second point concerns the measurement of statistical efficiency 
by either the variance or the mean square error of the estimator. The 
mean square error is the variance of the estimator plus its bias squared. 

In large scale applications of simulations, as treated in this study, it is 
important to obtain internal assessments of the variability of the estima- 
tion procedures by, for instance, estimating a quantile as the average of 
r estimates from samples of size m , where r m = m. The sample 
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1 ^2 

standard deviation calculated from the r estimates, divided by (r ) ' , 

then estimates the sampling standard deviationof the quantile estimator (see 
Mosteller and Tukey, 1968, for more details). In order to assess the 
internal variability of the estimation in this way the bias of the estimator 
must be small compared to the standard deviation of the estimator. 
Otherwise we obtain, from one point of view, a significant bias component 
in the mean square error, or from another point of view, an estimate 
with very small sampling variance of the wrong quantity, i, e, , true 
quantile plus bias. 

Bias thus becomes a very important factor in assessing the 
quantile estimates discussed in this paper, 

A third consideration is that in some cases one can find particular 
properties of a statistic whose distribution is to be estimated that allow 
one to obtain estimates that are more "efficient** than those obtained by 
straight synthetic sampling. By **efficiency '* here we mean statistical 
efficiency , or the relative variances or mean square errors of different 
estimating procedures. However there are other less tangible costs in- 
volved in simulation. One is the time involved in deriving a particular 
procedure for a given statistic, another the time involved in programming 
and debugging such a procedure and the delay in obtaining results. Still 
another cost is the actual computing time involved though this is seldom 
mentioned in the statistical literature, (This latter point will become 
clearer later in the paper, ) These less tangible costs are an important 
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component of overall computational efficiency. The point of view taken 
in this paper is that the rapidly accelerating availability of large memory, 
high internal speed computers makes it usually more ’^efficient'*, in the 
general computational sense, to forego using special techniques for each 
particular statistic and to use very simple straightforward simulation 
techniques^, Thus a criterion for the quantile estimation techniques dis^ 
cussed here is programming and computing simplicity ^ 

We do not mean to imply by this that all sophisticated statistical 
and Monte Carlo techniques are not useful or applicable. Global techniques 
Such as jackknifed estimates (Quenouille, 1956) or the use of variance 
reduction techniques such as control variables (Gaver, 1969) can be used 
with the quantile estimation methods discussed in this papero The jack- 
knife technique will be discussed in the next section and the use of the 
quantile estimation techniques in the context of sophisticated Monte Carlo 
will be discussed elsewhere^ 

Finally, it is perhaps worthwhile to give some idea of the numbers 
envisioned in connection with the estimation procedures^ Clearly no 
scheme involving only raw simulation will work satisfactorily in estimat- 
ing a Oo 001 quantile (x^ unless the number m of replications 

involved is substantially greater than 1000» Typically in the COMPSTAT 
program (Goodman and Lewis, 1972), for which these techniques were 
developed, replications of 1 00,000 or more are common* These are not 
excessive on the IBM 360/91 on which the runs were made^ In addition, 
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the large core memory of this machine enabled us to run sampling experi- 
ments on up to 31 statistics sim\iltaneously« 

The following discussion uses as examples, for the most part, 
the extreme 0^ 001 and 0, 999 quantiles^ The techniques are still 
useful, though not as much so, for the inner quantiles^ Some discussion 
of the simultaneous estimation of, for example, all sixteen quantiles 
listed above is given^ The dependence of the utility and efficiency of the 
various quantile estimation schemes on the particular quantile or set of 
quantiles plus, for example, variations in the complexity of computation 

of the series X , <» ® o , X and the statistics S(n), T(n),«<,<, indifferent 
1 n 

problems make it difficult to be dogmatic about the relative utility of 
various quantile estimation schemes^ In addition, most of the results 
required are finite sample results^ These are difficult to obtain analytic- 
ally and expensive, as yet, to obtain computationally^, 

iio quantile estimation 

a) Overall Considerations 

Two general methods of quantile estimation are considered, one 
based on the order statistics of the sample, the other based on stochastic 
approximation (Robbins-Monro) techniques (see, g, , Robbins and 
Monro [1951], Hodges and Lehman [1956], Cochran and Davis [l965])« 

For simplicity we drop the notation which indicates de- 
pendence on n and write S(n) as S and write its distribution 
function simply as F(s). A collection of m independent random 
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variables with distribution F( s) will be denoted by S , . * * , S. , ^ , S 

1 1 m 

and the corresponding order statistics by S < ^ * < S, S, . , 

(1) (2) (i) (m) 

The order statistic estimator of the or-quantile s , where 

a = F(s ), is 
a 



^([am])’ 

where [or m] denotes the integral part of a m. Thus for a = 0.999 and 



( 2 . 1 ) 



a 



m = 10, 000, 

The stochastic approximation estimate s (m) is defined to be the 
m th value in the sequence defined by 

C 



9 (i) = ? (i-D- 

or Of 1 



1 - sgn {S.-s^(i=D } 



— a 



(i = 1,2,. . . ,m), (2. 2) 



where sgn x * 1 if x > 0 and -1 if x 0, and s { 0) is an arbitrary 



a 



initial value. If the constant C is chosen to be l/f(s ), where f ( s ) is 



a 



a 



the density associated with F(s) evaluated at the quantile s^, then the 
asymptotic variance (m-^oo) of s (m) is minimized^ In fact, 



E {s (m) } ^ s 
a a 



( 2 . 3 ) 



and 



var { s (m) } 



Q-d -a) 

mf^(s ) 
a 



( 2 , 4 ) 



Remarkably, the estimate a has the same asymptotically normal 

distribution as does s (m)o Results on rates of convergence are known 

a 

for s but not for s {m)« This will be discussed later, but significant 
a a 
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comparisons can be made on the basis of the asymptotic results and on 

known computational results^ 

Order Statistics, 

nor 

computation time (to order the m realizations of S) is pro- 
portional to m ln(m); 

computer memory required for the sorting process is proportional 
to m, (Actually, to or m if or < , (1 -or) m if a > ^ )\ 

no initial values or knowledge of F(s) is required; 
the rate of convergence of the estimate is known. 

Stochastic Approximation, 

computation time (binary comparison) is proportional to m; 
computer memory required is proportional to 2; 

initial values s (0) and values for f( s ) are needed, presumably 
ot a 

previous estimates or guesses based on prior knowledge; 

- no reliable results known on the rate of convergence of s , or 

a 

even of E(s ); 

a 

it is not necessary to know S^ exactly, only that it is greater 

A 

than or less than This can be very advantageous if com- 

putation of S is time-consuming. 

In summary, has very definite advantages over in terms 

of computational considerations. One might say that the asymptotic 
relative efficiency of s compared to "s , in terms of real time and not 

A 

sample size m, is zero. However, initial values are needed for s^, so 
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that the asymptotic results really beg the question^ Further differences 
appear in terms of finite sample properties of the estimators, and these 
are discussed next. 



Ill* QUANTILE ESTIMATION - Finite Sample Considerations for s 

It is well known (David, 1971 ) that for the ordered sample 

S/i X < < 0.0 < S.,. < ..o < S. . 

(1) (2) (i) (m) 



a 



^(s) =prob = (^][F(s)f [l-F(s)]”""^ 



(s) = f"")i [F(s)]'"^ [l-F(s)]”"'' f(s), 

(i) '' ' 

Efs } = E{S,r . J = s + o(^), 
a ([or m ] ) a \ mf 



^i) ® 



var fs ) - var {S.. + O 

" mf^s ) 

a 






(3.1) 

(3.2) 

(3.3) 
(3. 4) 



Asymptotic expansions for E {s } are known (David and Johnson 
[1954] and Clark and Williams [1958] ) , the first terms in the expansion 
being 

f'(s ) /, ^ 2[(f'(s ))^-f(s )f"(s )] \ 

^ a’ al-a) ' a'' ^ a' a ^ 2ar (1 -a) (1-2^ 

^ - 7X~T * • (mlz) (n^^S) ' ’ “ 

2 f ( s ) 5 f ( s ) 

a 



6r(s ) 

a 



i 'X ra 



where derivatives are denoted by primes and powers by arabic numerical exponents. 
No precise conditions for these asymptotic expansions seem to be known, and 
they must be used with care. For example, if S is uniformly distributed be- 
tween 0 and 1, s and, using (3, 2), E {S } *= i/( mf 1 ) =<i/m)-i/[(ni)(m+ 1)]. 

or ( i) 
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Thus if amis the integer i, we have 



a 



^ ^ “ (mfl) ’ 



but the second term of the expansion (3, 5) is zero. Similarly, in the 
extreme but computationally useful case S = » where U is uni- 

form, we have 



F(s) 



1+ s • 



(3.6) 



a distribution with an infinite mean. Surprisingly, however, s can be 



a 



shown, using (3, 2) to be an unbiased estimator of s , In this case the 



a 



expansion (3^5 ) does not even converge. 

An important consequence of (3, 3) is that the jackknife technique 
(Quenouille [1956], Mosteller and Tukey [1966]) for eliminating an O(^) 



term in the bias can be applied to the order statistic estimate s , By 



a 



way of illustration consider the technique being applied with just a simple 

splitting of the data. Thus, assume m is even and ^ (1) is the order 

statistic estimator from the first m/2 S.*s, s (2) the estimator from 

1 a 

the second m/2 S^*s. The **typical values** are defined to be 



s(l) = 2s - s(l), 
a a a 






(3.7) 



an<i the jackknifed estimate is 



s' (1) + ¥ (2) s (1) s (2) 

= or a a a 

8 =* r - 2s - 

O' 2 a 



2 



2 



2 



(3.8) 
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The 0(-— ) term in E[s ] is zero* There is no appreciable 

computing cost involved in obtaining the jackknifed estimate^ The two 

halves of the sample can be sorted in place to obtain ’s (1) and s’ (2), 

a a 

and then these sorted halves are merged to obtain the complete, sorted 

sample S, , < <S This in fact is just the usual binary sorting 

(1) (m) 

procedure which results in min (m) sorting tirne^ 

The main drawback to the jackknife procedure is that it may in- 
crease the variance of the estimator, and this variance is difficult to 
obtain analytically. We have 

var {"s } = 5 var {.'s* } - 2 cov fs "s (1) } :< 5 var fs } 

a a ^ a a ^ a \ ( 

The covariance term involves the covariance between the [a(m/2)] order 

statistic in half the sample, and the [am] order statistic in the whole 

sample. If in fact s’ (1) and "s’ (2) are approximately uncorrelated, then 

there is no increase in variance. Even if the variance is inflated enough 

to make the mean square error of the estimates approximately equal, 

there is a gain in that the smaller bias of the jackknifed estimator allows 

for sectioning the complete sample of size m into r smaller sections 

of size m^ , This gives a more precise empirical variance estimate 

and smaller computation time, the latter following since shorter sections 

are sorted. 

Unfortunately, there is some evidence (Miller, 1964) that jack- 
knifing is a poor technique to use with estimators involving extreme order 
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statistics. We give now an illustrative example. 

Example - The exponential distribution 

Consider the estimation of the 0* 999 quantile s for a unit 

7 7 / 

exponential distribution^ For simplicity we assume that the sample 
size m is such that 0^ 999 m is an integer We have 

y » F(s) « 1 - e ® , 

s x= F \y) * -In (1-y) , 



s = - In ( 1-a) , 
or 



By direct methods (Cox and Lewis, 1966) one gets 
E {T } = E 



a 



(3. 10) 



IV," -i-i “(i-)- 



var {s } = — — 

a m(l -or) 



These results are used to give Table The ratio (column 8) of 
the standard deviation ( o", column 7) of the estimate to the bias of the 
estimate (column 3) indicates roughly how feasible it is to use averages of 
estimates from samples of size m to estimate *1“^ more pre- 

cisely, along with an estimate of the sampling variance of that estimate. 

Thus 36 samples of size m 10, 000 produces an estimate with a standard 
deviation approximately equal to the absolute value of the bias, clearly an 
undesirable situation. Moreover the bias is -0<> 051, so that this estimate 
gives us accuracy of at best two decimal places^ This would not be acceptable 



in many cases 
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The leading term of the bias of the jackknifed estimate (3. 8) 
is shown in Table 2 (column 2) as a function of m. Column 3 
shows the ratio of the standard deviation of the unjackknifed esti- 
mator to this bias. Clearly, sectioning and averaging is much more 
feasible. However, as indiated in column 4 an increase by only a 
factor of 1. 03 in the variance of makes the mean square 

error of this estimate as large as that of s^ at m « 10, 000, 

s 0, 999 

Clearly, jackknifing is of greater utility in the range of m from about 
1,000 to 5, 000, if the variance does not blow up and moreover, it 
is desirable to use sections as short as this if possible, to reduce 
computation time. 

Estimates of the variance of obtained by synthetic 

sampling are given in column 5 of Table 2. The quantities in paren- 
theses are estimates (63 degrees of freedom) of the standard deviation 
of these variance estimates. There is enough increase in the vari- 
ance over the unjackknifed situation to characterize the gain 
from using the jackknife as being marginal rather than categori- 
cal with this type of quantile estimate. For the exponential case, 
however, detailed calculations on computation times, bias, etc. , 
show that in large simxjdations there would be an advantage in 
using with sections of length approximately m ■ 5, 000, 
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An alternative scheme is to use as a ^’typical value** an 

order statistic estimate, ^ from a small independent sample 

of size m/i. Then the estimate (s* - i T )/(l-i) has a mean 

a a 

value free of the 1/m term and a variance decreasing to 

var {s' } as j? increases* For i = 10 the variance is 
a 

1 . 36 var {s }; for i = 20 it is 1. 13 var {"s’ }, an in- 
a a 

crease smaller than that found for 999* However, this 
technique is difficult to use for an m much less than 10,000 
in estimating a 0. 001 quantile, 

IV, QUANTILE ESTIMATION - Finite Sample Considerations for s 

a 

Experience shov/s that for extreme quantiles, convergence 
of s (m) is so slow as to be unacceptable. Though problems 
might be anticipated in the tails of extremely skew distributions, 
e, g, , the distribution F(s) * s/(l+s) discussed above, they 
occur elsewhere too, as can be seen roughly in the following 
example. 
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Example - The unit exponential distribution 

Consider again the unit exponential and the or = Oo 001 quantile. 

This is 

® 0 . 001 " - (1 - 0 . 001 ) ~ 0 . 001 . 



We have 



^^® 0 . 001 ^ 



- 0.001 



1 . 



Assume the initial estimate is OOI^^^ ~ 0.001, and that is less 
than 0. 001. This has probability 0. 001. Then 

® 0 . 001 ^^^*^ ~ 

Clearly successive S.’s are greater than the estimates until the esti- 



mates get back to zero. After i steps the estimate will have moved 
i , 

0.001 X 1 X 1/i in the positive direction and since 



-^1 1 
.2, 0. 6772 + in(f)+ , 

the return to the origin takes about i = steps. By following through 

this example it can be seen that the estimate is almost sure to become 
negative and take an enormously long time to return to zero. 

The jackknife technique is not suitable as a means of overcoming 
this difficulty, partly because the order of the leading term in the 
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asymptotic expansion for the bias is not known, but also because if the 
correct jackknifing technique was applied (possibly, for a m term) the 

example makes it clear that the estimator would still be unsuitable. 

Two other techniques suggest themselves, Kesten [1958] seems 
to be one of the few authors to have noted the problem of long runs occur- 
ring in the stochastic approximation scheme. He suggested higher-order 
memory schemes, for example, not increasing the divisor of C in (2,2) 
by one if all p previous differences of sample values and estimators 
were of the same sign. Although this would clearly help in the example, 
it violates the need for simplicity. In fact, the divisors in the estimators 
of different quantiles for different statistics become different, and this 
creates very complex and time consuming programming procedures. 

Another technique is to bound the estimator s^(i). Thus requir- 
ing — 0 iri the example would have obviated the problem, but uses 

specific information about the statistics. Empirically derived bounds 

can be obtained at the same time that initial values s (0) and f{s ) 

a a 

are obtained. For example a small pilot run using order statistics 
will give these initial estimates and bounds for outer quantiles 
(, 001 and , 999) from the known properties of the order statistics. 

Inner quantiles are bounded by outer quantiles. Thus it has been found 
empirically that if A= | boiind - s"^(0) |, then when C/a> 100, the 
stochastic convergence scheme works reasonably well. It is, however, 
ponderous when compared to the quantile estimation scheme discussed 



in the following section, 
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V. THE MAXIMUM (OR MINIMUM) TRANSFORMATION 

It is natural to look for a computationally simple transformation 
of the data to overcome some of the problems of estimating extreme 
quantiles, and the following transformation appears to solve most of the 
computational and statistical problems^ 

Assume, e^ g, , that a > 1/2, and consider the first v S*s, i, e, , 

Then the distribution T^(s) of the maximum of the v S*s is 

F(s) = prob {( S.) <s} = {F(s)}^. (5.1) 

Note that 



F{ s ) = F^ ( s ) = = or* 

a a 



(5.2) 



and 



s = s 
a c 



I f 



(5^3) 



I V 

so that the or-quantile of F(s) is the same as the a = a quantiles of 
F(s), If we assume that v divides m, and m* = m/v, taking maxima 
of successive groups of v S*s gives a reduced sample of size m‘, u , 



S* s* , s* , . 

i c m 



Thus we have a reduced sample and have transformed the problem from 
estimating an extreme quantile (for level or) to estimating a more reasonable 

•y 

quantile (for level or ). For example, one might take v large enough to re- 

V 

duce the problem to estimating a median. Since a = 1/2 in this case, 
v= (In l/2)/(ln a). The consequent values of v are shown for eight a's 



in Table 3 
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Table 3 



Size of V 


. V 

for a = 


1/2 










0, 900 0. 950 


0, 97 5 


0. 980 


0. 990 


0.995 


0.998 


0.999 


6.6 13.5 


27. 4 


34. 3 


69. 0 


138. 3 


346,2 


692,8 



Some general points about the use of the transformation in quantile 
estimation follow^ 

(1) The transformation of the original sample {S. } into the reduced 
sample (Sj } is very efficient computationally since only v 
binary comparisons and one memory cell are needed to obtain 
from 

(Z) The transformation uses no special information about the 
properties of S, e, g, , S > 0. 

(3) Although transforming to the median is not necessarily optimal, 
it is known that stochastic approximation estimates of the 
median work well (Cochran and Davis ^ 1965)-, 

(4) If a < 1/Z the minimum of the v S*s is used instead, 

VI, THE MAXIMUM TRANSFORMATION: ASYMPTOTIC VARIANCE 



Using (2, 4) we now compare the asymptotic variance of a quantile 
estimate (order statistic or stochastic approximation) s"^^ from the re- 



duced sample with that of the quantile estimate s^ from the original sample. 



We have, from (5, 1), 

I(s) « V f{s) {F(s) 
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and, therefore, using (5. 3) 






v-1 



= v f(s ) {F(s ) }^ ^ 
a a 

j I 

f(s ,) = V f(s )a^ = V f(s )-^ 

a a a a 



( 6 , 1 ) 



Using these expressions in (2, 4) we have 

V V 

g (1-g ) 



var {s I } 



- [^(s ,)Y 

V g' 



(0, 5 < g < 1) 



(1 - cY') 



a (1-g) 

■ ■ " O ' 

mf^(s ) v(l-Q')a^ ^ 

a 



( 6 . 2 ) 



= var {s^} * g(o';v) . (6, 3) 

It can be shown that g(o';v) increases from 1 to infinity as v 
increases from 1 to infinity^ Moreover, for the median transformation, 
v^:? (In l/2)/(ln a), we have, as a approaches 1 



g(a;v); 



1 



0, 6932 



U 443 . 



The function g(a;v) at the value of v generating the median transforma- 
tion varies little with a, as is shown in Table 4, 



Table 4 

Variation of g(o';v) with a at median transformation 



a 


0. 9000 


0. 9990 


0.9999 


V 


7 


693 


6931 


g{a',^) 


1. 4020 


1. 4426 


1. 4426 
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Thus the statistical efficiency of the estimate is decreased by 
approximately 1/(1.443), but the speed of the maximum transformation, com- 
pared to the computations involved in, for example, the stochastic approximation 

estimate would probably make their computational efficiency about equals 
By computational efficiency we mean the relative computing times required 
to achieve the same variance^ Both estimates (reduced and non-reduced 
samples) are asymptotically unbiased,, 

The variation of g(o';v) with v for a - 0, 999 is shown in Table 5« 

Table 5 

Variation of g{o';v) with v at or = 0,. 999 



V 


10 


50 


100 


200 


500 


700 


1000 


1 

a 


0, 990 


0.951 


0. 905 


0. 819 


0. 606 


0. 496 


0. 368 


g(0, 999; v) 


1. 005 


1. 025 


1. 051 


1. 107 


1.297 


1. 448 


1, 718 



The choice of v in particular situations depends on computational 
considerations, particularly the meshing of estimates for various a*s 
for a given statistic, and although no global results can be given, we discuss 
these considerations in the next two sections. 

VIL THE MAXIMUM TRANSFORMATION: COMPUTATIONAL CONSIDERATIONS 
(i ) Order statistics 

The use of the maximum transformation with order statistic 
quantile estimates gives very little gain. Bias is not an extreme considera- 
tion here, and computations for several examples with the asymptotic ex- 
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pansion (3^5) show that there is very little change in small sample bias 
from one estimate to the other. Memory size is not much affected. For 

ot= 0,999 we require 0 , 001 m memory cells with the original sample; 

1 1 m 

using V = 693 to give ^ 1/2 and a reduced sample, we need -r m* - ^ 

2 2 693 

memory cells. The slight advantage is lost when it is noticed ( 6 , 2) that a 
larger sample size, m, is required to achieve equivalent variance. 

If the eight quantiles with a*s given in Table 3 are required, the 
unreduced sample ordering (done at one time for all eight quantiles) re- 
quires 0, 1 m memory cells. The eight quantiles require eight separate 

1 m 



reduced samples with memory requirements 7 ^ , *7 ^ , 

M I 



jl. ^ ^ ^ 

2 693 ’ 2 347 



or a total roughly equivalent to 0 , 1 m. 

The time gain from sorting smaller samples is again marginal. 
Stochastic approximation 

The greatest gain in using the maximum (or minimum) scheme 
comes from the reduction of bias with the stochastic approximation esti- 
mator (2, 2), There are other gains; the maximum operation is faster than the 
the computation (2, 2), and changing v and is very simple. One could 
do the eight quantiles of Table 3 from eight reduced samples obtained from 
v*s of 7 , 14, 28, 35, 70, 140, 350, 700, These values are close to the 
v’s given in Table 3 and all divide 7 00, Thus it is easy to compute the 
eight reduced samples simultaneously in nested loops. 

Again only a fixed number of memory locations is required. 
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A possible scheme for selecting initial values for the stochastic 

approximation estimator is the following. Let S'^^, . , . , s'^i be the 

reduced sample. Take the first three values, S'^, and order 

them as ^( 2 )» ^(3)' ^ such as to make a' ^ 1/2, use 

as ^ j (0), the initial value, and since S' . and S* . estimate the 

a 11 ) 13 ; 

— and — quantiles of F(s) estimate f(s ,) as: 

4 4 o 



2 



1 



'^r(3)"^(2) 



^ (^(2)"^(d) 

(This is the simplest of many density estimates. ) 
The stochastic approximation estimate 

l-sgn {s! - ^ i(i-l)} 



(7.1) 



s I (i) » s ,(i-l) - — 

a. CL 1 






i = 1,2,. . . , m, (7. 2) 



uses as the S, , i=l, in the equation* 

The value of C is not critical and this estimate should be well- 
behaved* The jackknife technique (3* 8) can be applied, using alternate 
values for the sub-estimates, although it is not known if the leading 

term in the bias is 0(l/m)« 

Analytical results for s^| and the jackknifed estimate , cannot 
be obtained and sampling results are given in Tables 6, 7 and 8 for S 
having a unit exponential distribution and S having the distribution 
F(s) - s/(l+s). 
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(iii) Sampling results 

Results are given in Tables 6, 7 and 8 from an extensive 
simulation to investigate the properties of quantile estimates based on 
(7. 1) and (7. 2), The columns in these tables show successively m, 
the sample size; m*, the reduced sample size; the expected values 
and standard deviations of the jackknifed estimator based on (7, 1) 

and (7. 2) and the expected values and standard deviations of the straight- 
forward estimator based on (7. 1) and (7 2). The last column 

in the Tables gives the asymptotic standard deviations from (6. 2). Note 
that V * 7 00. To indicate the precision obtained in the sampling ex- 
periment the quantities in parentheses in the last row of each table are 
estimates (7 degrees of freedom) of the standard deviations of the estimates. 

In Table 6, the minimum transformation gives estimates of 
»0 *= 0. 001001, the 0. 001 quantile of the unit exponential distribu- 

tion. The jackknifed estimator s^i, where a* »= 1 - [1-0. 001] , has 

converged by the time m reaches roughly 30, 000. There is still a 
small bias in the unjackknifed estimator , approximately one tenth 

of the standard deviation of the estimator at m >= 28, 000. There is a 
penalty paid in terms of a larger standard deviation for the jackknifed 
estimator 20%) and the standard deviation for s^^i is larger than 
predicted by the asymptotic formula (6. 2). This is due to the added 
variability introduced by the density estimate (7. 1) and the finite 



sample size m. 
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TABLE 6 



Estimation of 0. 001 quantile with minimum transformation 
and stochastic approximation, with (s^t) and without ( s^f) 
jackknife, a* = 1 -[1 -0. 001 ]^, v = 700 . Unit exponential 
distribution. 



F(s)»l-e"®; F(s) - 1 - [1 -F(s)f; s^ - 0. 001 001 



m 


t 


.) 

1 . . _ 


st. dev. (s ,) 
a 




st. dev. (s" ,) 
a* 


Asymptotic 
st. dev. 


4200 


6 


1 

1 

' 0.00107 


0. 00095 


0. 00111 


0. 00065 


0. 000587 


5600 


8 


0. 00106 


0. 00081 


0. 00109 


0. 00058 


0. 000509 


7000 


10 


0. 00104 


0. 00073 


0. 00108 


0. 00053 


0. 000455 


8400 


12 


0. 00103 


0. 00067 


0. 00107 


0. 00050 


0. 000415 


9800 


14 


0. 00103 


0. 00062 


0. 00106 


0. 00047 


0. 000385 


11200 


16 


0. 00102 


0. 00058 


0. 00106 


0. 00045 


0. 000360 


12600 


18 


0. 00102 


0. 00055 


0. 00105 


0. 00043 


0. 000339 


14000 


20 


0. 00102 


0. 00053 


0. 00105 


0. 00042 


0. 000322 


16800 


24 


0. 00101 


0. 00049 


0. 00104 


0. 00040 


0. 000294 


19600 


28 


0. 001008 


0. 000456 


0. 001034 


0. 000378 


0. 000272 


28000 


40 


0. 001003 


0. 000400 


0. 001025 


0. 000344 


0. 000228 


42000 


60 


0. 000999 
(0. 000001) 


0. 000346 
(0. 000001) 


0. 001016 
(0. 000001) 


0. 000310 
(0. 000002) 


0. 000186 
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TABLE 7 



Estimation of 0. 999 quantile with maximum transforma- 
tion and stochastic approximation, with (“s i) and without 
(s ,) jackknife, o* = 0. 999^, v *= 7 00 . Un?t exponential 
distribution. 



F(s) = 1 - e"®; F(s) = [F(s)]'"; s^ = 6. 908 



m 


m‘ 


E(f ,) 
Of 


st. dev. (s' ,) 

a 


t 


st. dev. (s' i) 
or 


Asymptotic 
st. dev. 


4200 


6 


6. 946 


0. 97 0 


6. 962 


0. 640 


0. 587 


5600 


8 


6. 942 


0. 825 


6. 957 


0. 569 


0. 508 


7000 


10 


6. 931 


0. 736 


6. 949 


0. 522 


0. 455 


8400 


12 


6. 929 


0. 670 


6. 946 


0. 488 


0. 415 


9800 


14 


6. 926 


0. 619 


6. 942 


0. 461 


0. 384 


11200 


16 


6. 923 


0. 581 


6. 940 


0. 440 


0. 359 


12600 


18 


6. 920 


0. 548 


6. 936 


0. 423 


0. 339 


14000 


20 


6. 919 


0. 520 


6. 934 


0. 406 


0. 321 


16800 


24 


; 6. 919 


0. 481 


6. 933 


0. 386 


0. 293 


19600 


28 


6. 916 


0. 448 


6.929 


0. 365 


0. 27 2 


28000 


40 


6. 91? 


0. 384 


6. 923 


0. 326 


0. 227 


42000 


60 


6. 912 


0. 332 


6. 920 


0. 294 


0. 186 






(0. 001) 


(0. 001) 


(0. 001) 


(0. 001) 

1 
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TABLE 8 

Estimation of 0. 999 quantile with maximum transforma- 
tion and stochastic approximation, with ('s ,) and without 

(s" i) jackknife, of* = 0.999'^, v ■= 7 00. ^ 

a — — — ■' — 

F(s) = s/(l+s); F(s)= [s/(l+s)]'"; 999 



m 


1 

» 

m* 




St. dev. (s ,) 

a' 




St, dev. (s' ,) 
a 


Asymptotic 
St. dev. 


4200 


6 


1048 


2902 


1233 


1375 


587 


5600 


8 


1196 


17 29 


1217 


1233 


508 


7000 


10 


1076 


2416 


1197 


1195 


455 


8400 


12 


1126 


1412 


1180 


1042 


415 


9800 


14 


1063 


1422 


1163 


872 


384 


11200 


16 


1076 


1078 


1148 


80? 


359 


12600 


18 


1052 


1082 


1137 


738 


339 


14000 


20 


1060 


1123 


1127 


753 


321 


16800 


24 


1048 


87? 


1112 


685 


293 


19600 


28 


1042 


828 


1103 


698 


27 2 


28000 


40 


1025 


7 47 


1077 


624 


227 


42000 


■ 60 


1016 


551 


1055 


494 


186 






( 1) 


( 23) 


( 1) 


( 28) 
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In Table 7 the maximum transformation gives estimates of 

s = 6. 908, the 0. 999 quantile of the unit exponential distribution. 

0. 999 

The bias is smaller for the jackknifed estimator, and extrapolation of 
figures in column 3 gives convergence in the mean for most purposes 
by m * 50, 000. Note, however, that the standard deviation/ bias ratio 
is large 100) down the column. Again, there is an inflation in the 
standard deviation of the jackknifed estimator. 

The bias is examined more closely in Figure 1 where the absolute 
value of the (estimated) bias is plotted on a log scale against m for both 
estimators. The curvature indicates that higher order terms than 1/m 
are still important for these sample sizes. Except for the m * 42,000 
point, the bias in the jackknifed estimator appears to be falling off more 
rapidly than the bias for (here * 0, 999^^^). No formal regression 

analyses of these sampling results have been done. 

Note that for both estimators the absolute value of the bias (except 
for the last point) is less than the absolute value of the bias in the order 
statistic estimator. This is given by (3. 1) and plotted as a sequence of 
crosses in Figure 1. 

Bias is more serious for the extreme case of the distribution 
F(s) » s/(l+s), as shown in Table 8. The jackknifed estimator is ad- 
vantageous here where the standard deviation/ bias ratio is smaller than 
for the exponential distribution and convergence is a little slower. 



BIAS 



29 



0.10 



X 



X 



X EXACT BIAS (ORDER STATISTIC) 
o ESTIMATED BIAS (sa') 

• ESTIMATED BIAS da') 



X 

o X 
o 



X 



0 . 01 - 



0.002 



1 



10,000 



30,000 

SAMPLE SIZE, m 



50,000 



Figure 1. Exact bias for the order statistic estimator s" , of the 0. 999 quantile 



a 



of the unit exponential distribution, and estimated bias for estimators s , and 



a 



n.Qinnr f Vi rri y i rm i m f-rPins fn-rma tinn ( ry' ^ 0 QQQ v = 7 00^ and stnrhastir. 
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VIII. CONCLUSIONS AND FURTHER WORK 

The conclusion of this investigation is that the maximum trans- 
formation and the stochastic approximation scheme (7. Z) yields a 
quantile estimator for extreme quantiles (e. g. , x qqq) which is fast 
and linear in sample size m; requires a small, fixed amount of storage 
and without using external information provides a virtually bias free 
estimate even in extreme cases (such as the s/(l+s) distribution) for 
sample sizes of m = 50, 000. In most cases smaller samples will be 
satisfactory. 

There are possibilities, based on the properties of extreme 
value distributions, of improving the quantile estimators performance 
even more. 

Note too that the estimator can be used to advantage for smaller 



a than 0. 999, and it is completely suited for use with global variance 
reduction techniques such as the use of control variables (Gaver, 1969) 
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are simple and do not depend on special features of the distributions of 
the statistics considered. We discuss limitations, when the probability 
level is very high or very low, of two basic methods of estimating 
quantiles. One method is the direct use of order statistics; the other 
is based on the use of stochastic approximation. 

Several modifications of these two estimation schemes are considered, 
In particular a simple and computationally efficient transformation of the 
simulation data is proposed and the properties (i.e. bias and variance) of 
quantile estimates based on this scheme are discussed. ^ 
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